{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building ADML From Scratch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the last section we saw how ADML works. We saw how we train our model with both clean and adversarial samples to obtain a better and robust model parameter $\\theta$ that is generalizabel across tasks. \n",
    "\n",
    "\n",
    "Now we will better understand ADML by coding them from scratch. For better understanding, we consider a simple binary classification task. We randomly generate our input data and we train them with a single layer neural network and try to find the optimal parameter $\\theta$. \n",
    "\n",
    "Now we will step by step how exacly we are doing this,\n",
    "\n",
    "First we import all the necessary libraries,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import tensorflow as tf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Clean Samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we define a function called sample_points for generating clean input (x,y) pairs. It takes the parameter k as an input which implies number of (x,y) pairs we want to sample. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sample_points(k):\n",
    "    x = np.random.rand(k,50)\n",
    "    y = np.random.choice([0, 1], size=k, p=[.5, .5]).reshape([-1,1])\n",
    "    return x,y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The above function returns output as follows, "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.88181038 0.51678418 0.81979321 0.32583704 0.12557307 0.30478364\n",
      " 0.96272806 0.00636423 0.0375513  0.05887383 0.52036732 0.66574474\n",
      " 0.45427606 0.28874493 0.19352738 0.44417903 0.63303766 0.93938007\n",
      " 0.05098809 0.10348272 0.1037544  0.59786084 0.64339621 0.07255698\n",
      " 0.3412259  0.31813905 0.20525812 0.20301659 0.90735178 0.14418036\n",
      " 0.57155755 0.53092418 0.62812951 0.96301036 0.75604885 0.01451255\n",
      " 0.21633051 0.34509995 0.57552133 0.82287969 0.7935126  0.50339959\n",
      " 0.62920624 0.6642846  0.41366068 0.12444433 0.09815865 0.15425461\n",
      " 0.43131473 0.64740023]\n",
      "[0]\n"
     ]
    }
   ],
   "source": [
    "x, y = sample_points(10)\n",
    "print x[0]\n",
    "print y[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Adversarial Samples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "Now, we define one more function called FGSM for generating adversarial inputs. We use Fast Gradient Sign Method (FGSM) for generating adversarial samples. We have seen how FGSM generates the adversarial pairs by calculating gradients with respect to the inputs instead of model parameter. So, We take clean (x,y) pairs as an input and generate adversarial (x_adv,y) pairs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def FGSM(x,y):\n",
    "\n",
    "    #placeholder for the inputs x and y\n",
    "    X = tf.placeholder(tf.float32)\n",
    "    Y = tf.placeholder(tf.float32)\n",
    "\n",
    "    #initialize theta with random values\n",
    "    theta = tf.Variable(tf.zeros([50,1]))\n",
    "\n",
    "    #predict the value of y\n",
    "    YHat = tf.nn.softmax(tf.matmul(X, theta)) \n",
    "\n",
    "    #calculate the loss\n",
    "    loss = tf.reduce_mean(-tf.reduce_sum(Y*tf.log(YHat), reduction_indices=1))\n",
    " \n",
    "    #now calculate gradient of our loss function with respect to our input X instead of model parameter theta\n",
    "    gradient = ((tf.gradients(loss,X)[0]))\n",
    "    \n",
    "    #calculate the adversarial input\n",
    "    #i.e x_adv = x + epsilon * sign ( nabla_x J(X, Y))\n",
    "    X_adv = X + 0.2*tf.sign(gradient)\n",
    "    X_adv = tf.clip_by_value(X_adv,-1.0,1.0)\n",
    "\n",
    "    #start the tensoflow session\n",
    "    with tf.Session() as sess:\n",
    "\n",
    "        sess.run(tf.global_variables_initializer())        \n",
    "        X_adv = sess.run(X_adv, feed_dict={X: x, Y: y})\n",
    "        \n",
    "    return X_adv, y"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X_adv , Y = FGSM(x,y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Single Layer Neural Network"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use a neural network with a single layer for predicting the output.\n",
    "\n",
    "a = np.matmul(X, theta)\n",
    "\n",
    "\n",
    "YHat = sigmoid(a)\n",
    "\n",
    "So, we use ADML for finding this optimal parameter value theta that is generalizable across tasks. So for a new task, we can learn from a few data points in a lesser time by taking very less gradient steps."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Adversarial Meta Learning (ADML)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we define a class called ADML where we implement the ADML algorithm. In the \\__init__  method we will initialize all the necessary variables. Then we define our sigmoid function. Followed by we define our train function. \n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can check the comments written above each line of code for understanding."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "class ADML(object):\n",
    "    def __init__(self):\n",
    "\n",
    "        #initialize number of tasks i.e number of tasks we need in each batch of tasks\n",
    "        self.num_tasks = 2\n",
    "        \n",
    "        #number of samples i.e number of shots  -number of data points (k) we need to have in each task\n",
    "        self.num_samples = 10\n",
    "\n",
    "        #number of epochs i.e training iterations\n",
    "        self.epochs = 100\n",
    "        \n",
    "        #hyperparameter for the inner loop (inner gradient update)\n",
    "\n",
    "        #for clean sample\n",
    "        self.alpha1 = 0.0001\n",
    "\n",
    "        #for adversarial sample\n",
    "        self.alpha2 = 0.0001\n",
    "        \n",
    "        #hyperparameter for the outer loop (outer gradient update) i.e meta optimization\n",
    "        \n",
    "        #for clean sample\n",
    "        self.beta1 = 0.0001\n",
    "        \n",
    "        #for adversarial sample\n",
    "        self.beta2 = 0.0001\n",
    "\n",
    "        #randomly initialize our model parameter theta\n",
    "        self.theta = np.random.normal(size=50).reshape(50, 1)\n",
    "\n",
    "    #define our sigmoid activation function  \n",
    "    def sigmoid(self,a):\n",
    "        return 1.0 / (1 + np.exp(-a))\n",
    "    \n",
    "    #now let us get to the interesting part i.e training :P\n",
    "    def train(self):\n",
    "        \n",
    "        #for the number of epochs,\n",
    "        for e in range(self.epochs):        \n",
    "            \n",
    "            #theta' of clean samples\n",
    "            self.theta_clean = []\n",
    "\n",
    "            #theta' of adversarial samples\n",
    "            self.theta_adv = []\n",
    "            \n",
    "            #for task i in batch of tasks\n",
    "            for i in range(self.num_tasks):\n",
    "                \n",
    "                #sample k data points and prepare our training data\n",
    "\n",
    "                #first, we sample clean data points\n",
    "                XTrain_clean, YTrain_clean = sample_points(self.num_samples)\n",
    "\n",
    "                #feed the clean samples to FGSM and get adversarial samples\n",
    "                XTrain_adv, YTrain_adv = FGSM(XTrain_clean,YTrain_clean)\n",
    "\n",
    "                #1. First, we computer theta' for clean samples and store it in theta_clean\n",
    "                \n",
    "                #predict the output y \n",
    "                a = np.matmul(XTrain_clean, self.theta)\n",
    "\n",
    "                YHat = self.sigmoid(a)\n",
    "\n",
    "                #since we are performing classification, we use cross entropy loss as our loss function\n",
    "                loss = ((np.matmul(-YTrain_clean.T, np.log(YHat)) - np.matmul((1 -YTrain_clean.T), np.log(1 - YHat)))/self.num_samples)[0][0]\n",
    "                \n",
    "                #minimize the loss by calculating gradients\n",
    "                gradient = np.matmul(XTrain_clean.T, (YHat - YTrain_clean)) / self.num_samples\n",
    "\n",
    "                #update the gradients and find the optimal parameter theta' for clean samples\n",
    "                self.theta_clean.append(self.theta - self.alpha1*gradient)\n",
    "              \n",
    "                #2. Now, we compute theta' for adversarial samples and store it in theta_clean\n",
    "\n",
    "                #predict the output y \n",
    "                a = (np.matmul(XTrain_adv, self.theta))\n",
    "\n",
    "                YHat = self.sigmoid(a)\n",
    "\n",
    "                #calculate cross entropy loss\n",
    "                loss = ((np.matmul(-YTrain_adv.T, np.log(YHat)) - np.matmul((1 -YTrain_adv.T), np.log(1 - YHat)))/self.num_samples)[0][0]\n",
    "                \n",
    "                #minimize the loss by calculating gradients\n",
    "                gradient = np.matmul(XTrain_adv.T, (YHat - YTrain_adv)) / self.num_samples\n",
    "\n",
    "                #update the gradients and find the optimal parameter theta' for adversarial samples\n",
    "                self.theta_adv.append(self.theta - self.alpha2*gradient)\n",
    "                \n",
    "            #initialize meta gradients for clean samples\n",
    "            meta_gradient_clean = np.zeros(self.theta.shape)\n",
    "\n",
    "            #initialize meta gradients for adversarial samples\n",
    "            meta_gradient_adv = np.zeros(self.theta.shape)\n",
    "            \n",
    "            for i in range(self.num_tasks):\n",
    "                \n",
    "                #sample k data points and prepare our test set for meta training\n",
    "\n",
    "                #first, we sample clean data points\n",
    "                XTest_clean, YTest_clean = sample_points(self.num_samples)\n",
    "\n",
    "                #feed the clean samples to FGSM and get adversarial samples\n",
    "                XTest_adv, YTest_adv = FGSM(XTest_clean,YTest_clean)\n",
    "                       \n",
    "                #1. First, we computer meta gradients for clean samples \n",
    "\n",
    "                #predict the value of y\n",
    "                a = np.matmul(XTest_clean, self.theta_clean[i])\n",
    "                \n",
    "                YPred = self.sigmoid(a)\n",
    "                           \n",
    "                #compute meta gradients\n",
    "                meta_gradient_clean += np.matmul(XTest_clean.T, (YPred - YTest_clean)) / self.num_samples\n",
    "\n",
    "                #2. Now, we compute meta gradients for adversarial samples\n",
    "                \n",
    "                #predict the value of y\n",
    "                a = (np.matmul(XTest_adv, self.theta_adv[i]))\n",
    "                \n",
    "                YPred = self.sigmoid(a)\n",
    "                           \n",
    "                #compute meta gradients\n",
    "                meta_gradient_adv += np.matmul(XTest_adv.T, (YPred - YTest_adv)) / self.num_samples\n",
    "\n",
    "            #update our randomly initialized model parameter theta\n",
    "            #with the meta gradients of both clean and adversarial samples\n",
    "            \n",
    "            self.theta = self.theta-self.beta1*meta_gradient_clean/self.num_tasks\n",
    "\n",
    "            self.theta = self.theta-self.beta2*meta_gradient_adv/self.num_tasks\n",
    "                                  \n",
    "            if e%10==0:\n",
    "                print \"Epoch {}: Loss {}\\n\".format(e,loss)             \n",
    "                print 'Updated Model Parameter Theta\\n'\n",
    "                print 'Sampling Next Batch of Tasks \\n'\n",
    "                print '---------------------------------\\n'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model = ADML()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch 0: Loss 1.06869670787\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 10: Loss 2.20820318343\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 20: Loss 0.556802980902\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n",
      "Epoch 30: Loss 1.16247761064\n",
      "\n",
      "Updated Model Parameter Theta\n",
      "\n",
      "Sampling Next Batch of Tasks \n",
      "\n",
      "---------------------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "model.train()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
